尽管在机器学习中无处不在使用随机优化算法,但这些算法的确切影响及其对现实的非凸位设置中的概括性能的动态仍然知之甚少。尽管最近的工作揭示了随机优化中的概括与重尾行为之间的联系,但这项工作主要依赖于连续的近似值。对于原始离散时间迭代的严格处理尚未进行。为了弥合这一差距,我们提出了新颖的界限,将概括与在离散时间和连续时间设置中围绕局部最小值相关联的过渡内核的下尾指数。为了实现这一目标,我们首先证明了根据应用于优化器轨迹的著名的fernique-talagrand功能绑定的数据和算法依赖性的概括。然后,我们通过利用随机优化器的马尔可夫结构,并根据其(数据依赖性)过渡内核来得出界限来擅长于此结果。我们通过各种神经网络的经验结果来支持我们的理论,显示了概括误差与较低尾声之间的相关性。
translated by 谷歌翻译
在他们的损失景观方面观看神经网络模型在学习的统计力学方法方面具有悠久的历史,并且近年来它在机器学习中得到了关注。除此之外,已显示局部度量(例如损失景观的平滑度)与模型的全局性质(例如良好的泛化性能)相关联。在这里,我们对数千个神经网络模型的损失景观结构进行了详细的实证分析,系统地改变了学习任务,模型架构和/或数据数量/质量。通过考虑试图捕获损失景观的不同方面的一系列指标,我们证明了最佳的测试精度是如下:损失景观在全球连接;训练型模型的集合彼此更像;而模型会聚到局部平滑的地区。我们还表明,当模型很小或培训以较低质量数据时,可以出现全球相连的景观景观;而且,如果损失景观全球相连,则培训零损失实际上可以导致更糟糕的测试精度。我们详细的经验结果阐明了学习阶段的阶段(以及后续双重行为),基本与偶然的决定因素良好的概括决定因素,负载样和温度相同的参数在学习过程中,不同的影响对模型的损失景观的影响不同和数据,以及地方和全球度量之间的关系,近期兴趣的所有主题。
translated by 谷歌翻译
最近引入的普通微分方程网络(ODE-网)在深度学习和动态系统之间建立了丰富的连接。在这项工作中,我们使用基础函数的线性组合重新考虑重量作为连续的函数,这使我们能够利用诸如功能投影的参数变换。反过来,这个视图允许我们制定处理有状态层的新型有状态ode-块。这个新的ode-块的好处是双重的:首先,它使得能够纳入有意义的连续深度批量归一代化层以实现最先进的性能;其次,它使得能够通过改变来压缩权重,而不会再培训,同时保持近最先进的性能并降低推理时间和存储器占用。使用卷积单元和(b)使用变压器编码器单元将(b)句子标记任务应用于(a)图像分类任务来证明性能。
translated by 谷歌翻译
我们为研究通过将噪声注入隐藏状态而训练的经常性神经网络(RNN)提供了一般框架。具体地,我们考虑RNN,其可以被视为由输入数据驱动的随机微分方程的离散化。该框架允许我们通过在小噪声制度中导出近似显式规范器来研究一般噪声注入方案的隐式正则化效果。我们发现,在合理的假设下,这种隐含的正规化促进了更平坦的最小值;它偏向具有更稳定动态的模型;并且,在分类任务中,它有利于具有较大分类余量的模型。获得了全局稳定性的充分条件,突出了随机稳定的现象,其中噪音注入可以在训练期间提高稳定性。我们的理论得到了经验结果支持,证明RNN对各种输入扰动具有改善的鲁棒性。
translated by 谷歌翻译
As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text.
translated by 谷歌翻译
Drawing from the resources of psychoanalysis and critical media studies, in this paper we develop an analysis of Large Language Models (LLMs) as automated subjects. We argue the intentional fictional projection of subjectivity onto LLMs can yield an alternate frame through which AI behaviour, including its productions of bias and harm, can be analysed. First, we introduce language models, discuss their significance and risks, and outline our case for interpreting model design and outputs with support from psychoanalytic concepts. We trace a brief history of language models, culminating with the releases, in 2022, of systems that realise state-of-the-art natural language processing performance. We engage with one such system, OpenAI's InstructGPT, as a case study, detailing the layers of its construction and conducting exploratory and semi-structured interviews with chatbots. These interviews probe the model's moral imperatives to be helpful, truthful and harmless by design. The model acts, we argue, as the condensation of often competing social desires, articulated through the internet and harvested into training data, which must then be regulated and repressed. This foundational structure can however be redirected via prompting, so that the model comes to identify with, and transfer, its commitments to the immediate human subject before it. In turn, these automated productions of language can lead to the human subject projecting agency upon the model, effecting occasionally further forms of countertransference. We conclude that critical media methods and psychoanalytic theory together offer a productive frame for grasping the powerful new capacities of AI-driven language systems.
translated by 谷歌翻译
Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
有限混合物建模是聚类领域的一种流行方法,并且在很大程度上是由于其软聚类成员资格概率所致。但是,EM算法是适合有限混合模型的最常见算法,是许多问题的受害者。我们解决了使用有限混合模型的困扰聚类的这些问题,包括在高维情况下与局部最大值和算法速度问题相对应的解决方案的收敛。这是通过开发两种新型算法来完成的,这些算法结合了数据矩阵的光谱分解和非参数bootstrap采样方案。模拟显示了我们的算法的有效性,不仅证明了它们的灵活性,而且还证明了与其他(自举)聚类算法相比,它们避免了与局部墨西哥相对应的溶液的能力。我们的新型算法通常具有更一致的收敛标准,并且在适合有限混合模型的其他自举算法中,速度显着提高。
translated by 谷歌翻译
本文考虑了从野外单视图像中无监督的3D对象重建的问题。由于歧义性和内在的不良性,这个问题本质上难以解决,因此需要强大的正则化以实现不同潜在因素的分离。与现有的作品将明确的正规化引入目标功能不同,我们研究了一个不同的空间进行隐式正则化 - 潜在空间的结构。具体而言,我们限制了潜在空间的结构,以捕获潜在因素的拓扑因果排序(即代表因果关系作为定向无环形图)。我们首先表明,不同的因果顺序对于3D重建至关重要,然后探索几种方法以找到与任务有关的因果因素排序。我们的实验表明,潜在空间结构确实是隐式正规化,并引入了有益于重建的电感偏见。
translated by 谷歌翻译